A.  INTRODUCTION

Traditional  approaches  to  training  on  actual  equipment  are 
becoming  more  and  more  prohibitive  because  of  relatively  high 
cost  and  their  limited  ability  to  be  used  for  training  on  unusual 
or  potentially  catastrophic  situations.  Simulators  are  used  to 
cope  both  with  increasing  costs  and  limitations  on  training 
effectiveness.  Spangenberg  (1976)  discussed  seven  unique 
advantages  of  using  simulators  for  training.  Simulators  can 
(1)  provide  immediate  feedback,  (2)  increase  the  number  of  system 
malfunctions  and  emergencies  to  provide  the  trainee  with 
experience  which  would  be  unavailable  on  actual  equipment, 
(3)  compress  time  so  a  complex  sequence  of  tasks  may  be 
accomplished  in  the  time  it  would  take  to  run  through  only  one  or 
two  tasks  on  the  actual  equipment,  (4)  vary  the  sequence  of  tasks 
to  maximize  training  efficiency,  (5)  provide  guidance  and 
stimulus  support  to  the  trainee  in  the  form  of  prompts  and 
feedback,  (6)  vary  the  difficulty  level  to  match  the  skill  level 
of  each  individual  trainee,  and  (7)  provide  the  trainee  with  an 
overview  from  which  the  trainee  may  form  an  overall  understanding 
of the whole situation. These advantages, in addition to the
potential cost-effectiveness, are the reasons why simulators have
been widely used.

Simulators take various forms. These include mock-ups,
photographic mimics, and computer graphics. Usually they are
less expensive than real systems. However, large mock-ups like
those used to train pilots are expensive. The cost of a
simulator usually increases with the fidelity of the simulator
even though increased fidelity does not guarantee better
training. Several terms used frequently in this area are defined
below and are followed by a general overview of this report.


DEFINITIONS 

A "simulator" is a device or a facility which represents a
machine, system, or environment and its functions (Gerathewohl
1969). Simulators have been widely used to train operators for
maintenance, normal operations, problem-solving, and decision
making. Simulators have been constructed for a variety of
applications. Clymer (1980) identified at least eight different
types: aircraft, aerospace, marine, ground vehicle, traffic,
process plant, power plant, and manufacturing plant. The terms
"training device" and "trainer" are very often used to mean the
same thing as simulator, although some slight differences can be
distinguished between them (Gagne 1954).

"Fidelity"  and  "realism"  are  terms  used  frequently  in  the 
simulation  and  training  community.  However,  their  definitions 
are  not  clearly  stated.  A  more  comprehensive  discussion  of 
fidelity  will  be  presented  later  in  this  report.  For  now  it  is 
sufficient  to  note  that  fidelity  or  realism  refers  to  the  degree 
to  which  a  device  or  a  facility  accurately  simulates  a  machine  or
a  system.  It  is  generally  believed  that  high  fidelity  training 
devices  cost  more  than  low  fidelity  devices.  A  simulator  that 
incorporates  only  those  features  that  are  necessary  to  train  for 
a  given  task  has  the  highest  potential  for  cost-effectiveness. 

Suitable  means  must  be  devised  to  evaluate  the  effectiveness 
of  training  programs.  The  extent  to  which  a  given  simulator 
facilitates  the  acquisition  of  appropriate  skills  by  the  trainees 
is  characterized  by  "transfer  of  training"  from  the  training 
devices  to  actual  equipment,  "training  effect"  or  "training 
effectiveness".  These  terms  are  also  used  to  describe  the 
effectiveness  of  a  training  program  which  may  or  may  not  include 
a  simulator.  In  this  report  these  terms  will  be  used  primarily 
for  the  former  case.  Conventionally,  simulators  have  been 
employed  in  training  with  the  assumption  that  higher  fidelity 
produces  a  better  transfer  effect.  However,  research  that 
contradicts  this  assumption  has  also  been  reported  during  the 
past  few  years  (see  Section  D).

OVERVIEW 

Simulator  training  is  only  one  option  for  a  training 
program.  Other  available  options  include  classroom  lectures, 
books  and  manuals,  slides,  movies,  demonstrations,  practice  on 
real  equipment,  and  on-the-job  training.  Whatever  options  are 
used, they are all intended to facilitate the human learning
process. Hennessy (1981) therefore comments that most problems
associated with simulator training are no different from
those in other types of training. What makes simulator
training unique is the complexity of the equipment involved and
the cost of simulators.

Several  factors  can  affect  the  effectiveness  of  simulator 
training.  These  include  instructors'  roles,  user  acceptance, 
management  support,  student  characteristics,  simulator  fidelity, 
training  strategy,  training  time,  and  pretraining  knowledge. 
Among these factors, only simulator fidelity will be covered in
this report. This does not imply that the other factors are
unimportant. Consideration of the effects of all relevant
factors would be beyond the scope of this report.


Training  simulators  may  consist  of  several  subsystems  which 
interact  with  each  other.  Since  each  subsystem  may  contain 
hundreds  of  indicators  and  gauges,  it  can  be  expensive  to 
construct  and  run  such  a  simulator.  Therefore,  the  question  of 
how  to  efficiently  utilize  simulators  becomes  important.  This 
problem  has  been  investigated  in  Section  B. 


A distinction between two types of training is made in
Section C. The state-of-the-art on simulator training is also
described. Then "training effectiveness" and "fidelity level"
are discussed. These two problems are discussed in almost every
study of simulator training. The evaluation of the training
effectiveness of simulators, in terms of the commonly used
experimental paradigm, its measurement, criticisms, and
modifications, is also provided in Section C.

Section  D  describes  the  issue  of  simulator  fidelity, 
including  its  definition,  relationships  with  training, 
measurement  and  components.  Finally,  potential  research 
approaches  to  training  are  described  in  Section  E. 


B.  OVERALL  REVIEW  OF  SIMULATOR  TRAINING 


Two types of simulator training can be identified. One is
training for system operation and the other is training for
maintenance, i.e., fault diagnosis or troubleshooting, repair,
and tests to assure normal operation.

In  training  for  operation,  the  trainees  do  not  know  how  to 
operate  the  system  before  training  begins.  However,  they  may 
possess  some  basic  knowledge  about  the  system  operation.  For 
example,  a  training  simulator  for  a  Boeing  747  aircraft  may  be 
designed  under  the  assumption  that  the  trainees  already  know  how 
to  fly  other  types  of  airplane.  However,  nothing  in  the  B-747 
simulator  should  be  left  out  solely  on  the  basis  of  trainees 
having  flown  other  aircraft.  Although  operations  under  normal 
conditions  are  usually  implied,  this  type  of  training  could,  and 
perhaps  should,  involve  operations  under  abnormal  conditions  or 
degraded  mode. 

In  training  for  maintenance,  the  trainees  must  have  learned 
to  operate  the  system  under  normal  conditions.  Hence,  this  type 
of  training  can  be  thought  of  as  forming  the  second  stage  of  a 
training program. In the following discussion, greater emphasis
will be placed on fault diagnosis, also referred to as
troubleshooting.

This distinction is important since the characteristic
differences between the two types of training result in
different training approaches.

TRAINING  FOR  OPERATION 

Training  for  operation  emphasizes  visual-motor  coordination 
types  of  task,  such  as  steering  a  vehicle,  or  flying  an  airplane. 
The  physical  layout,  environmental  factors,  handling  quality, 
visual  and  motion  cues,  scenic  view,  and  vibration  are  all 
reported  to  be  influential  factors  on  training  effectiveness 
(Semple et al. 1981, Martin and Waag 1978). Relatively high
simulator fidelity is generally provided for this type of
training (Baum et al. 1982), although the required degree of
fidelity is not known. Expensive mock-ups are widely used for
training of this type. However, less expensive equipment such as
three-dimensional computer graphics simulators and simulators
with computer-generated imagery (Forbus and Stevens 1981) has
been investigated as a substitute for mock-ups. The target task
is  relatively  well  understood  and  therefore  the  training 
objectives  are  usually  well  defined.  The  transfer  effects  are 
sometimes  difficult  to  determine  due  to  the  cost  and  risk  of 
operating  the  real  system. 

TRAINING  FOR  TROUBLESHOOTING 

In training for troubleshooting, greater emphasis is placed
on the acquisition of procedural or cognitive skills, such as
failure detection, fault diagnosis, problem solving, decision
making, and information seeking. Some empirical evidence has been
accumulated to justify the use of low fidelity simulators for
this purpose (Crawford and Crawford 1978). Most researchers
argue that the cognitive nature of the problems is more
important than visual-motor coordination. The target task is
relatively difficult to define.

In  a  fault  diagnosis  situation,  it  is  possible  that  some
failures  and  their  causes  may  not  be  known  in  advance.  It  is 
infeasible  to  train  the  operator  for  all  cases.  The  objective  of 
training  of  this  type  is  therefore  the  acquisition  of  general 
diagnostic  ability.  In  other  words,  the  rationale  of  using  a 
fault  diagnosis  simulator  for  training  is  that  general  diagnostic 
ability  can  be  developed  through  the  exposure  to  specific 
diagnostic  experiences.  It  is  thus  assumed  that  the  learning  of 
many  similar  fault  diagnosis  tasks  in  a  simulator  results  in  the 
gradual  development  of  the  problem  solving  ability  for  the 
simulated  system.  The  transfer  effect  is  difficult  to  determine 
due  to  the  lack  of  suitable  metrics  for  cognitive  skills  as  well 
as  practical  limits  on  one's  ability  to  present  realistic 
troubleshooting  problems  for  the  purpose  of  measuring  transfer  of 
training. 


STATE-OF-THE-ART 


Simulator  training  has  been  studied  extensively  since  World 
War  II.  Gagne  (1954)  summarized  the  research  up  to  1954  and 
pointed  out  the  problems  and  future  research  directions. 
Twenty-seven  years  later,  Hennessy  (1981)  indicated  that  research 
on  simulator  training  since  Gagne  has  done  very  little  to  improve 
our  understanding.  Most  of  the  outstanding  research  issues  were 
the  same  as  those  pointed  out  by  Gagne.  Many  studies  were 
conducted  to  evaluate  the  effectiveness  of  particular  pieces  of 
equipment.  Cost  effectiveness  analysis  was  another  highly 
investigated  area.  Training  strategy  and  instructional  methods 
were  investigated  broadly.  Several  performance  measures  have 
been developed and used. Hennessy (1981) presented a summary of
the current research issues on simulator training. After a
relatively extensive literature search, a modified and extended
version of his list was compiled and is shown below.

1. Training Strategy:

   - Adaptive or fixed amount of training
     (Freedy and Lucaccini 1981)
   - Self-paced or fixed schedule
   - Optimal use of simulator (Weitz and Adler 1973)
   - Total information or withheld information
     (Duncan and Shepherd 1975)

2. Instructional Methods:

   - Role of the instructor
       involved and directive or provide error
       feedback only
   - Instructor model (McCauley et al. 1982)
   - Knowledge of results
       error or accuracy
       augmented or intrinsic
   - Learning situation
       team training or individual training
       (Eggemeir and Cream 1978)
       learning style

3. Training for Normal Operation:

   - Evaluation of device effectiveness
     (Finley et al. 1978)
   - Whole or part training
       can complex skills be trained separately?
       are components separable?
       are they learned at different rates?
   - Retention of training (Goldberg et al. 1981)
   - Effectiveness of a particular factor
       visual cues, motion cues, vibration etc.
       (Semple et al. 1981)

4. Maintenance and Procedural Training:

   - Evaluation of device effectiveness
     (Fink and Shriver 1978)
   - Retention of training (Johnson 1981)
   - Model of problem-solving (Rouse 1981)
   - Development of system (Johnson and Path 1983)
   - Affective factors (Morris and Rouse 1983)
   - Aiding (Lintern 1980)

5. Training Device Design:

   - Design guidelines (Van Cott and Kinkade 1972)
   - Device requirement and characteristics (Miller 1974)
   - New devices and approaches (Levin and Fletcher 1981)
   - Use of microcomputer (Crawford and Crawford 1978)

6. Performance Measure:

   - What to measure
       measures of problem solving performance
       (Henneman and Rouse 1984)
       criterion-referenced measure (Swezey 1978)
   - Reliability and validity (Goldstein 1978)
   - How to measure
       formulas for transfer of training
       (Hammerton 1977)
       rating (Cooper and Drinkwater 1971)
       transfer function (Matheny 1978)
   - Predictive index
       measurement of fidelity (Narva 1977)
   - Task analysis
       basis for fidelity measurement (Hays 1981)
       basis for device requirement
       (Wheaton et al. 1976)

7. Cost-effectiveness of Training Devices:

   - Cost effectiveness analysis
     (Orlansky and String 1981)

8. Methodology Consideration:

   - Transfer of training
       in-simulator transfer of training
       (Westra 1981)
       criticism (Adams 1979)

Training  effectiveness  is  the  major  concern  of  most  of  the 
research  mentioned  above.  The  selection  of  training  strategy  and 
instructional  method,  design  of  training  devices,  determination 
of  cost  effectiveness,  and  adoption  of  suitable  predictive 
indices  are  based  on  the  measurement  of  training  effectiveness. 
To  determine  cost  effectiveness,  measures  for  training 
effectiveness  and  cost  are  needed.  These  in  turn  are  based  on 
different  types  of  data  and  measurement  methodologies.  Even 
though cost effectiveness is the most important factor when a
decision on the procurement of simulators must be made, such
decisions will not be considered here since they are beyond the
scope of this report.

The  next  section  will  discuss  the  issues  concerned  with 
evaluation  of  the  training  effect,  including  the  paradigm, 
performance  measures,  modification  to  the  paradigm  and  criticisms 
of  the  paradigm. 


C.  EVALUATION  OF  TRAINING  EFFECTIVENESS 


Several methods have been proposed to assess the effects of
simulator training upon operator performance in a real system.
These methods have included comparison with a control group (in a
transfer-of-training experiment), comparison with a model of
optimal performance, and subjective ratings. However, only
transfer of training and subjective ratings have been widely used
and studied. Transfer of training experiments may be costly to
conduct but the result is definitive. Subjective ratings are
easier to collect but the result, being subjective, may not be
definitive with regard to actual effectiveness. Rating studies
are widely used to predict the effectiveness of a simulator when
an empirical data base is not available, while transfer of
training studies are used to estimate the observed effectiveness
of a simulator.

TRANSFER  OF  TRAINING 

Transfer of training is an old issue in psychology. Gagne
et al. (1948) conducted a comprehensive review of the
measurement of transfer of training used by experimental
psychologists. Murdock (1957) outlined the paradigms used by
transfer experiments. He pointed out that some means of
comparing the amount of transfer resulting from distinct measures
was important. Osgood (1949) investigated the transfer effect
in a stimulus-response context. In his studies, subjects were
first taught to associate a specific stimulus with a specific
response. Subjects were then tested on other stimulus-response
pairs which might deviate from the original one. Osgood reported
that the similarity between the tested S-R pairs and the original
S-R pair could affect the transfer effect. He proposed a
"transfer surface", based on which two conclusions could be
drawn:

(1)  When  stimuli  are  varied  and  responses  are  identical, 
positive  transfer  is  obtained. 

(2)  When  stimuli  are  identical  and  responses  are  varied, 
negative  transfer  is  obtained. 

In  other  words,  the  degree  of  similarity  between  the  stimuli  and 
between  the  responses  determines  the  positive  or  negative 
transfer.  The  motivation  for  using  a  simulator  as  a  training
device  is  the  hope  that  positive  transfer  of  training  can  be 
elicited.  Therefore  studies  of  transfer  of  training  from  the 
simulator  to  the  real  system  have  long  been  used  to  evaluate  the 
effectiveness  of  a  training  device.  The  paradigm  commonly  used, 
the  performance  measures  and  their  drawbacks,  and  modifications 
are  presented  in  the  following  paragraphs. 

The  Paradigm

Valverde (1973) presented a comprehensive review of transfer
experiments conducted with aircraft during 1949-1971. Finley et
al. (1972), Meister et al. (1971), and Ryan et al. (1972) have
conducted a number of studies on evaluating the effectiveness of
naval training devices. Most of the transfer experiments
reviewed were based on the same paradigm, which is depicted below.


                        simulator     real system
                        training      training

    experimental group  yes           yes

    control group       no            yes

The  experimental  group  went  through  two  sections  of  training. 
The  first  section  was  the  simulator  training  while  the  second 
section  was  the  real  system  experience.  The  control  group 
experienced  only  the  second  section.  Both  sections  were 
considered  as  complete  after  stable  performance  above  some 
criterion  was  demonstrated.  The  performance  of  both  groups  in 
the  second  section  was  then  compared  to  see  if  training  in  the 
first  section  influenced  the  training  in  the  second  section. 
Conventionally  the  second  section  was  conducted  using  a  real 
system. 


Performance  Measure 


Transfer of training effects are usually measured in two
ways (Hammerton 1967): (1) savings measure, and (2) first-shot
measure. The savings measure determines the reduction of the
training efforts in the second section. The first-shot measure
evaluates the performance of the trainee on the first trial of
the second section.

(1)  Savings  measures 

The performance measure adopted widely is the percent
transfer based on improvement in performance on the real system.
The following formula is used extensively (Micheli 1972):

    percent transfer = 100 (c - e) / c

where:

    c = performance or time of the control group on the
        real system to achieve some criterion
    e = performance or time of the experimental group on
        the real system to achieve some criterion

Roscoe (1971) argued that the transfer measure is more
meaningful if the time spent on the simulator is also considered.
He therefore developed the Transfer Effectiveness Ratio (TER).
TER is a measure for assessing the effectiveness of a simulator
by expressing the savings in time on the real system as a
function of the time in the simulator. It is defined as the time
saved in the transfer task divided by the time required in the
simulator. Therefore,

    TER = (c - e) / te

where

    c  = time to reach some criterion on the real system by
         the control group
    e  = corresponding value for the experimental group
    te = time the experimental group spent on the simulator
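
As an illustration only, the percent transfer and TER formulas above can be evaluated with a few lines of Python. The function names and the numbers in the usage section are hypothetical and are not taken from any of the cited studies; c, e, and te follow the definitions given above and are assumed to be times expressed in the same unit.

```python
def percent_transfer(c: float, e: float) -> float:
    """Percent transfer = 100 (c - e) / c.

    c: time (or trials) the control group needs on the real system
    e: time (or trials) the experimental group needs on the real system
    """
    if c <= 0:
        raise ValueError("c must be positive")
    return 100.0 * (c - e) / c


def transfer_effectiveness_ratio(c: float, e: float, te: float) -> float:
    """TER = (c - e) / te, where te is the time spent on the simulator."""
    if te <= 0:
        raise ValueError("te must be positive")
    return (c - e) / te


if __name__ == "__main__":
    # Hypothetical values in hours, chosen only to show the arithmetic.
    c, e, te = 20.0, 12.0, 10.0
    print(percent_transfer(c, e))                   # 40.0
    print(transfer_effectiveness_ratio(c, e, te))   # 0.8
```

With these assumed figures, simulator training saves 40 percent of the real-system time, and each simulator hour saves 0.8 hours on the real system.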

(2) First-shot measures

The performance measurement problem is complicated by the
fact that, in some practical situations, the "control group" data
are not available or the simulated task is more difficult than
the real task. The first-shot measures can be employed to solve
these problems. Hammerton (1967) discussed four of them. The
following notation was used (see Figure 1).

    F : initial error score on the simulator for the
        experimental group
    T : initial error score on the real system for the
        experimental group
    C : initial error score on the real system for the
        control group
    L : error score after stable performance on the
        simulator for the experimental group
    S : error score after stable performance on the real
        system for the control group

The  following  measure  assesses  how  much  training  was 
retained  on  first  transferring  to  the  real  system. 

    percent of training retained = 100 (F - T) / (F - L)


Figure 1: Curves showing form of typical transfer experiment (modified
from Hammerton 1967)

For most purposes this measure is entirely satisfactory. However,
sometimes the simulated task is harder than the real one.
In that case S may differ significantly from L, and F may be
significantly larger than C. The formula can then
make the simulator appear more effective than it really is.
Hence, comparison of first-shot transfer with the stable
performance of the control group is preferred.


    percent of training retained = 100 (C - T) / (C - S)


Note  that  in  this  formula  C,  S  and  T  are  measured  on  the  real 
system.  The  role  of  the  simulator  is  only  expressed  indirectly 
in  T  which  is  the  error  score  of  the  first  trial  on  the  real 
system  after  stable  performance  has  been  reached  on  the
simulator.  Another  way  to  solve  this  problem  is  to  measure  how 
much  learning  is  retained  at  transfer  compared  with  that  which 
the  experimental  group  would  have  required  to  reach  the  stable 
performance  level  of  the  control  group. 


    percent of training retained = 100 (F - T) / (F - S)


The  last  measure  shows  how  first-shot  transfer  differs  from  the 
stable  performance  of  the  control  group. 


    percent of deviation = 100 (1 - T/S)
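
The four first-shot measures above can be compared side by side in a short sketch. The class name, the dictionary keys, and the error scores below are hypothetical and chosen only for illustration; F, T, C, L, and S follow the notation defined earlier. The example assumes a simulated task that is harder than the real one, so that F is larger than C and L differs from S.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ErrorScores:
    F: float  # initial error on the simulator, experimental group
    T: float  # initial error on the real system, experimental group
    C: float  # initial error on the real system, control group
    L: float  # stable error on the simulator, experimental group
    S: float  # stable error on the real system, control group


def first_shot_measures(s: ErrorScores) -> Dict[str, float]:
    """Evaluate the four first-shot measures in the order given in the text."""
    return {
        "retained_F_L": 100.0 * (s.F - s.T) / (s.F - s.L),
        "retained_C_S": 100.0 * (s.C - s.T) / (s.C - s.S),
        "retained_F_S": 100.0 * (s.F - s.T) / (s.F - s.S),
        "deviation": 100.0 * (1.0 - s.T / s.S),
    }


if __name__ == "__main__":
    # Hypothetical error scores for a simulator harder than the real task.
    scores = ErrorScores(F=60.0, T=20.0, C=40.0, L=20.0, S=10.0)
    for name, value in first_shot_measures(scores).items():
        print(f"{name}: {value:.1f}")
```

With these assumed scores the first formula reports 100 percent of training retained, while the comparison against the control group reports about 67 percent, which illustrates how the first formula can flatter a simulator whose task is harder than the real one.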


Appropriate measures should be selected with caution so that
no inferences are based on weak measures. Use of these two types
of measure (i.e., savings and first-shot) is not without
limitation. Hammerton (1977) noted that these two classes of
measures really dealt with different things. High savings
measures did not imply high first-shot measures automatically and
vice versa.

Criticisms  and  Modification

Although the transfer of training methodology is widely used,
it is not without flaws. As a matter of fact, the problems of
using a transfer of training measure to assess the effectiveness
of a simulator are significant. Several researchers have pointed
out the drawbacks and have proposed remedial procedures.

Mudd  (1968)  pointed  out  that  the  transfer  approach  is  not 
applicable  in  those  situations  where  the  system  being  simulated 
is  not  yet  operational  or  where  the  system  is  so  complex  that  it 
would  be  disastrous  to  use  an  untrained  control  group.  Another 
disadvantage  of  the  transfer  approach  is  that  generalization  to 
new  systems  is  not  possible,  so  each  new  system  needs  a  transfer 
study  to  determine  its  effectiveness. 

Reviewing the effectiveness of flight simulators, Adams
(1979) claimed that there are two reasons why it is hard to find
a suitable transfer study.

(1) The cost of a transfer experiment for the simulator of
an advanced aircraft is high.

(2) The transfer experiment is simply unsuited for advanced
aircraft because it is hard to believe that the control
group, without prior training on the new advanced
aircraft, can be allowed to fly.

He  argued  that  a  simulator  need  not  necessarily  be  tested  if  it 
is  based  on  reliable  scientific  laws  and  the  success  of  other 
systems  based  on  the  same  laws  has  been  high. 

Blaiwes, Puig and Regan (1973) maintained a similar view on
transfer of training as a measure for the effectiveness of
training devices for military usage. They claimed that the
difficulties of adopting transfer measures include: (1) the
dangers in employing a no-training control group, (2) the
difficulties in specifying appropriate performance measures and
criterion levels, (3) the problems in specifying appropriate
training goals and the need for task analyses, (4) the problems
of recording performance measures in training and operational
environments, and (5) the confounding of variables in training
and transfer situations due to an inability to exercise
experimental control.

To circumvent the difficulties involved in using transfer
of training, they suggested the four-level evaluation procedure
which was proposed by Jeantheau (1971). The first level of
evaluation is a qualitative assessment. It involves examining
the procedures used for training in terms of specified
objectives, and examining the device design in terms of the
degree to which these procedures can be implemented. The second
level of evaluation involves measurement of trainee performance
from the beginning of training to the end of training. This type
of assessment is not comparative, in the sense that performance
measured in the trainer is not compared with alternative methods
of training. Level three involves comparative measurement. To
ensure comparability, evaluations should be conducted in such a
way that comparisons are made between practice and
nonpractice, different training methods, or different devices.
Level four is transfer of training as described above.


Duncan and Shepherd (1975) criticized the transfer of
training study as inadequate to assess the training effectiveness
for fault diagnosis behavior. The infrequent and irregular occurrence
of failures makes it difficult to measure the transfer effect of
fault diagnosis training. Duncan and Shepherd tended to think of
the detection of each individual failure as a separate task
requiring training. This is different from viewing fault
diagnosis as a single task. Suppose the failure is "Heat
Exchange Pump Stops". The trainees have to learn how to identify
this failure and take remedial actions. However, Duncan and
Shepherd argued that one cannot wait until "Heat Exchange Pump
Stops" happens in a real system in order to test the
effectiveness of simulator training. Hence, Duncan and Shepherd
claimed that a transfer of training study is not an adequate
method to assess the effectiveness of diagnosis training.

Shepherd (1977) argued that instead of measuring the
transfer effect of a whole simulator, a measure along each
fidelity dimension would be more appropriate. For example, how
does color, size, or panel layout affect training effectiveness?
How does temporal fidelity affect the strategies adopted by the
trainee? Unfortunately, no empirical data were provided. Also,
there is still the problem of how to measure each fidelity
dimension.

Johnson  and  Rouse  (1982)  discussed  the  transfer  of  fault
diagnosis  ability  from  simulators  to  a  live  system.  Instead  of 
regarding  each  failure  as  a  task  like  Duncan  and  Shepherd  did, 
they  treated  fault  diagnosis  behavior  as  a  whole.  Transfer  of 
training  was  then  investigated  to  compare  the  effectiveness  of 
different  training  methods.  In  their  study,  the  fault  finding 
problems  on  the  live  system  were  safer  and  less  expensive  to 
manipulate  than  those  claimed  by  Duncan  and  Shepherd  to  occur 
infrequently  and  irregularly. 

Conventionally, the transfer effect is measured on the real
system, which, in some sense, is just a perfect mockup. However,
there are many difficulties in simulating the psychological
factors that have been reported to influence the operator's
decision making. Realizing these implicit difficulties and the
cost of conducting the transfer experiments with the real system,
one variation has been tried without training on the real system:
within-simulator metrics of transfer of training.

Shepherd  et  al.  (1977)  adopted  within-simulator  metrics  to 
measure  training  effectiveness.  The  trainees  were  trained  and 
tested  using  the  same  simulator.  Shepherd  et  al.  collected  a 
set  of  sixteen  failures  and  separated  them  into  two  groups  of 
eight  failures  each.  The  subjects  were  trained  on  one  group  of 
failures  and  tested  on  the  other  group.  All  experimental 
manipulations  were  conducted  within  the  same  simulator. 
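
A minimal sketch of such a within-simulator design is given below. The failure labels, the function name, and the use of random assignment are assumptions made for illustration; the report states only that sixteen failures were separated into two groups of eight, one used for training and the other for testing.

```python
import random
from typing import List, Sequence, Tuple


def split_failures(
    failures: Sequence[str], seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split a set of failures into equal-sized training and test groups."""
    if len(failures) % 2 != 0:
        raise ValueError("an even number of failures is required")
    shuffled = list(failures)
    # Random assignment is an assumption; the source does not say how
    # the two groups were formed.
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    # Hypothetical labels standing in for the sixteen simulated failures.
    failures = [f"failure_{i:02d}" for i in range(1, 17)]
    train_group, test_group = split_failures(failures)
    print(len(train_group), len(test_group))  # 8 8
```

Because the two groups do not overlap, performance on the test group reflects transfer to failures that were not encountered during training, all within the same simulator.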

Westra  (1981)  also  adopted  the  within-simulator  transfer  of 
training  paradigm  in  his  study  of  carrier  landing.  The  subjects 
were  trained  under  various  conditions  and  then  tested  under  a 
standard  condition  that  represented  maximum  realism.  This 
approach  permitted  a  relatively  large  number  of  variables  to  be 
studied.  Among  the  variables  investigated,  the  three  most
significant  factors  were  chosen  for  a  further
simulator-to-real-system  transfer  of  training  study.  Thus  the
within-simulator  transfer  study  was  used  as  a  selection  tool  for
features  to  be  included  in  a  further,  more  costly  and  difficult
transfer  study.


Summary 


Transfer  of  training  studies  have  long  been  used  to  assess 
simulator  effectiveness.  The  paradigm  involves  providing  an 
experimental  group  with  experience,  first  on  a  simulator,  and 
then  on  the  real  system.  The  control  group  is  trained  only  on 
the  real  system.  Performance  on  the  real  system  for  both  groups 
is  compared  to  determine  the  effectiveness  of  the  simulator. 
Time  savings  measures  and  first-shot  measures  are  the  most 
commonly  used  performance  measures.  In  training  for  normal 
operation,  the  transfer  of  training  studies  are  not  adequate  for 
measuring  the  effectiveness  of  training  devices  unless  a  control 
group  can  be  formed  appropriately.  In  training  for  fault 
diagnosis,  if  each  failure  is  viewed  as  a  separate  task,  the 
transfer  of  training  studies  are  not  suitable  for  measuring 
effectiveness  since  the  failure  may  occur  infrequently  and 
irregularly  in  the  real  system.  However,  if  fault  diagnosis 
ability  is  viewed  as  a  somewhat  context-independent  ability,  then 
the  study  of  the  transfer  of  training  is  more  meaningful. 

Within-simulator  transfer  of  training  has  been  used  as  a
substitute  for  the  conventional  paradigm.  The  performance  of  both
the  control  group  and  the  experimental  group  is  measured  on  the
simulator.  No  real  system  performance  is  involved.  This  method
is  especially  useful  for  measuring  training  effectiveness  for 
fault  diagnosis  tasks. 

In  general,  transfer  of  training  studies  are  adopted  most
often  to  assess  the  transfer  effect,  though  some  modifications 
may  be  necessary.  There  are,  however,  certain  situations  to 
which  the  transfer  of  training  method  cannot  be  applied.  Other 
methods  such  as  ratings  should  be  adopted  to  ensure  appropriate 
measure  of  training  effectiveness. 


RATING 

Ratings  have  been  used  widely  in  the  evaluation  of  flight 
simulators.  Typically,  the  raters  are  experienced  users  of  the 
actual  system.  They  go  through  the  training  program  and  gain  a 
general  impression  of  the  simulator  before  they  rate  the 
effectiveness  of  the  tested  simulator  according  to  some  scale. 
Ratings  are  also  used  extensively  for  prediction  of  the
effectiveness  of  training  devices.  Caro  (1970),  Wheaton  et  al.
(1976),  and  Narva  (1977)  relied  on  ratings  as  the  basis  of  a
predictive  model  of  the  effectiveness  of  training  devices.
Raters  were  asked  to  assess  the  training  technique,  physical  and
functional  similarity,  and  the  learning  deficit.  Scores  were
derived  from  ratings.  Those  scores  were  then  transformed  into  a
global  index  which  was  used  to  predict  the  training
effectiveness.

Ratings  are  usually  accomplished  through  appropriate  use  of
scales.  The  validity  and  the  reliability  of  the  scales  used  are
seldom  verified  due  to  cost.  Two  examples  of  such  scales  are
presented  in  the  following  paragraphs.


Rating  Scales 


A  six-point  rating  scale  used  by  Gerlach  et  al.  (1975)  in
their  study  of  landing  simulation  is  reproduced  below.


Rating  Adjective  Description

1.      Excellent  Virtually  no  discrepancies  between  real
                   systems  and  simulators
2.      Good       Very  minor  discrepancies
3.      Fair  +    Simulator  is  representative  of  the  parent
                   system
4.      Fair  -    Simulator  needs  work
5.      Bad        Simulator  is  not  representative
6.      Very  bad  Possible  simulator  malfunction

Class  1  through  class  3  are  deemed  as  satisfactory  while  class  4
through  class  6  are  unsatisfactory.
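
The six-point scale and its satisfactory/unsatisfactory split can be written down as a small lookup. This is a sketch for illustration only; the names `GERLACH_SCALE` and `is_satisfactory` are hypothetical and do not come from Gerlach et al.

```python
# Six-point scale as reproduced in the text: rating -> adjective.
GERLACH_SCALE = {
    1: "Excellent",
    2: "Good",
    3: "Fair +",
    4: "Fair -",
    5: "Bad",
    6: "Very bad",
}


def is_satisfactory(rating):
    """Return True for classes 1-3 and False for classes 4-6."""
    if rating not in GERLACH_SCALE:
        raise ValueError("rating must be an integer from 1 to 6")
    return rating <= 3


# Example: a simulator rated "Fair +" still counts as satisfactory.
fair_plus_ok = is_satisfactory(3)
fair_minus_ok = is_satisfactory(4)
```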

A  sequential  pilot-rating  decision  scale  was  proposed  by 
Cooper  and  Harper  (1971).  This  was  a  ten-point  scale  which 
guided  pilots  through  the  estimation  process  by  identifying  3 
major  characteristics:  controllability,  acceptability  and 
satisfaction.  The  raters  began  with  controllability.  If  the 
simulator  was  not  controllable  then  it  was  rated  10.  Otherwise  a 
check  was  made  to  see  if  it  was  acceptable.  If  unacceptable,  the
simulator  was  rated  7,  8  or  9  according  to  its  deficiencies.  If
it  was  acceptable  but  not  satisfactory,  then  the  simulator  was
rated  4,  5  or  6  according  to  the  identified  drawbacks.  If  it  was
satisfactory,  then  it  was  further  rated  1,  2  or  3  according
to  its  features.  The  scale  is  summarized  below:

Satisfactory      1.  Excellent,  highly  desirable
                  2.  Good,  negligible  deficiencies
                  3.  Fair,  some  mildly  unpleasant  deficiencies

Acceptable        4.  Minor  but  annoying  deficiencies
                  5.  Moderately  objectionable  deficiencies
                  6.  Very  objectionable  but  tolerable  deficiencies

Controllable      7.  Adequate  performance  not  attainable  with
                      maximum  pilot  compensation
                  8.  Considerable  pilot  compensation  is  required  for
                      control
                  9.  Intense  pilot  compensation  is  required  to
                      retain  control

Uncontrollable   10.  Lost  control
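
The sequential character of this scale, in which each answer narrows the range of admissible ratings, can be sketched as a short decision function. The function name and its boolean arguments are hypothetical; the choice of a single number within each range is left to the rater's judgement of the deficiencies, as in the description above.

```python
def cooper_harper_range(controllable, acceptable=False, satisfactory=False):
    """Return the (lowest, highest) admissible rating.

    Follows the order of questions described in the text:
    controllability first, then acceptability, then satisfaction.
    """
    if not controllable:
        return (10, 10)
    if not acceptable:
        return (7, 9)
    if not satisfactory:
        return (4, 6)
    return (1, 3)


# Example: controllable and acceptable, but not satisfactory.
example_range = cooper_harper_range(True, acceptable=True, satisfactory=False)
```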


Although  the  acceptance  of  simulator  ratings  for  inference
about  the  training  value  is  convenient  and  economical,  Adams  (1979)
pointed  out  eight  problems  with  them.

(1)  A  big  difficulty  is  the  underlying  assumption  that  the 
amount  of  transfer  of  training  is  positively  related  to 
the  rated  similarity  between  simulator  and  the  real 
system.  Raters  have  a  tendency  to  report  higher 
transfer  effect  for  simulators  with  higher  fidelity. 
However,  several  researchers  have  shown  that  high 
fidelity  does  not  necessarily  imply  high  transfer  rate. 
For  example,  Johnson  (1981)  found  that  training  devices 
do  not  need  to  be  of  high  fidelity  to  be  effective  in 
training  procedural  tasks. 

(2)  There  is  evidence  that  ratings  are  a  function  of  the
amount  of  experience  of  the  raters.  Meshier  and  Butler
(1976)  reported  an  experiment  in  which  experienced  and
inexperienced  pilots  were  both  asked  to  rate  the
usefulness  of  an  F4  simulator.  Both  groups  went
through  the  same  training  procedures  before  providing
the  ratings.  Twenty-eight  per  cent  of  the  experienced
pilots  rated  it  as  "excellent"  and  sixty  per  cent  of
the  same  group  rated  it  as  "good".  However,
sixty-eight  per  cent  of  the  inexperienced  pilots  rated
it  as  "excellent"  and  only  eighteen  per  cent  of  that
group  rated  it  as  "good".

(3)  Experience  in  the  simulator  affects  the  ratings.
Gerlach  et  al.  (1975)  reported  that  the  pilot's  rating
of  simulator  fidelity  improves  with  experience  in  the
simulator.

(4)  The  dimensions  of  a  simulator  interact  so  that  the
rating  of  one  dimension  is  affected  by  the  presence  of
another.

(5)  Raters  have  difficulty  distinguishing  human  skill 
deficiencies  from  the  deficiencies  of  the  simulator. 

(6)  It  is  not  necessarily  true  that  a  positive  correlation
exists  between  ratings  and  flying  performance  in  the
simulator.

(7)  Other  factors  affecting  training  effectiveness  are  not
included,  e.g.,  instructors'  role  and  training
syllabus.

(8)  Experienced  users  of  the  real  system  may  not  be 
appropriate  raters  for  the  training  devices. 

Summary 

Rating  is  an  overall  judgement  of  similarity  between  the 
responses  experienced  in  the  simulator  and  a  memory 
representation  of  the  responses  experienced  in  the  real  system. 
It  has  been  used  widely  in  the  evaluation  of  flight  simulators. 
Several  scales  were  proposed.  Two  of  them  were  discussed  here.
Gerlach  et  al.  adopted  a  six-point  scale,  while  Cooper  and
Harper  used  a  ten-point  sequential  decision  scale.  Rating  is 
easy  to  conduct  and  inexpensive  to  implement.  It  can  be  done 
before  the  training  program  or  even  before  a  working  simulator  is 
available.  Therefore,  in  considering  a  predictive  index  for  the 
effectiveness  of  the  simulator,  it  is  more  useful  than  the 
transfer  of  training  approach.  However,  Adams  pointed  out  a  few
problems  with  the  rating  approach.  The  major  drawback  is  its
subjectivity.


D.  SIMULATOR  FIDELITY 


DEFINITIONS 


"Fidelity"  has  been  used  widely  and  diversely  in  the 
simulator  training  community.  Different  people  use  the  term  with 
different  meanings.  Hays  (1981)  reviewed  the  literature  and 
noted  the  diversity  of  meaning.  He  further  found  that  most 
researchers  contrasted  physical  fidelity  with  non-physical 
fidelity.  It  is  non-physical  fidelity  that  attracted  a  variety 
of  names  and  definitions.  Functional  fidelity,  psychological 
fidelity,  task  fidelity  and  behavioral  fidelity  (Hays  1980)  are 
among  the  names  used.  In  general,  most  researchers  agree  that 
physical  fidelity  is  not  the  only  factor,  nor  the  main  factor, 
affecting  training  effectiveness.  There  is  also  general 
agreement  that  higher  fidelity  (assuming  it  can  be  measured)  is 
not  necessary  for  every  aspect  of  every  kind  of  training. 

There  appears  to  be  a  lack  of  research  activity  on  simulator 
fidelity,  and  of  an  appropriate  definition  of  what  is  meant  by 
fidelity.  After  reviewing  several  attempts  to  define  simulator 
fidelity,  Hays  (1981)  proposed  the  following  definition:

Training  simulator  fidelity  is  the  degree  of  similarity
between  the  training  simulator  and  the  equipment  which
is  simulated.  It  is  a  two  dimensional  measurement  of
this  similarity  in  terms  of:  the  physical
characteristics  of  the  training  simulator,  and  the
functional  characteristics  of  the  simulated  equipment.

Rouse  (1982)  defined  fidelity  as:

"the  precision  with  which  the  simulator  reproduces  the 
appearance  and  behavior  of  the  real  equipment." 

These  two  definitions  are  very  similar.  They  emphasize  that
fidelity  is  a  two  dimensional  concept.  They  also  point  out  the
measurement  problems.  Tasks  and  the  responses  of  the  trainees
are  not  explicitly  considered.

According  to  Kinkade  and  Wheaton  (1972),  the  fidelity  of  a
simulator  consists  of  three  different  components:  (1)  equipment
fidelity,  (2)  environmental  fidelity,  and  (3)  psychological
fidelity.  Equipment  fidelity  is  defined  as  the  degree  to  which
the  simulator  duplicates  the  appearance  and  "feel"  of  the  real 
system.  Environmental  fidelity  is  concerned  with  the  degree  to 
which  the  simulator  duplicates  the  sensory  stimulation,  e.g., 
dynamic  motion  cues,  visual  cues,  etc.  Psychological  fidelity  is 
simply  the  degree  to  which  the  trainee  perceives  the  simulator  as 
a  duplicate  of  the  real  system.  Equipment  fidelity  is  actually 
what  Hays  defined  as  physical  fidelity,  while  the  environmental 
fidelity  and  the  psychological  fidelity  together  approximate  his 
functional  fidelity.

Govindaraj  (1983)  proposed  a  three-dimensional  approach  in
which  further  descriptions  and  measurements  along  each  dimension 
are  discussed. 


(1)  Physical  fidelity:

Physical  fidelity  is  concerned  with  the  variables
presented  and  the  forms  they  take  as  well  as  the
environmental  factors  such  as  noise,  vibration  and
thermal  conditions.  Techniques  from  syntactic
pattern  recognition  are  proposed  to  measure
physical  fidelity.

(2)  Structural  fidelity:

Structural  fidelity  refers  to  the  relationships
between  subsystems.  Level  of  abstraction,  coupling
of  system  states,  and  aggregation  of  subsystems  are
the  primary  concerns.  Graph  theoretic  methods  are
proposed  for  measurement.

(3)  Dynamic  fidelity:

Dynamic  fidelity  refers  to  the  evolution  of  system
states  over  time  and  their  presentation  to
trainees.  Control  theoretic  methods  are  proposed
for  measurement.


This  definition  appears  to  be  relatively  comprehensive  and 
especially  useful  for  describing  the  fidelity  of  simulators  of 
large  complex  systems  such  as  power  plant  control  rooms.  Tasks 
and  trainees'  feedback  are  not  considered.  Non-physical  fidelity 
is  decomposed  into  structural  fidelity  and  dynamic  fidelity. 
This  provides  a  way  to  analyze  and  measure  the  functional  aspects 
of  a  simulator. 
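
To make the idea of a graph theoretic measure of structural fidelity concrete, the following sketch compares the subsystem couplings present in a real system with those reproduced in a simulator. It is an assumption made purely for illustration: the source does not say which graph measure Govindaraj proposed, and the overlap ratio (shared couplings divided by all couplings), the function name `edge_overlap`, and the subsystem names are hypothetical.

```python
def edge_overlap(real_edges, sim_edges):
    """Share of subsystem couplings common to both graphs (0.0 to 1.0).

    Each coupling is an undirected pair of subsystem names. This ratio
    is a stand-in measure chosen for this sketch, not the source's.
    """
    real = {frozenset(edge) for edge in real_edges}
    sim = {frozenset(edge) for edge in sim_edges}
    union = real | sim
    if not union:
        return 1.0  # two empty graphs are trivially identical
    return len(real & sim) / len(union)


# Hypothetical plant with four couplings; the simulator keeps two.
real_plant = [
    ("boiler", "turbine"),
    ("turbine", "generator"),
    ("turbine", "condenser"),
    ("condenser", "boiler"),
]
simulator = [("boiler", "turbine"), ("turbine", "generator")]
score = edge_overlap(real_plant, simulator)  # 2 shared of 4 total -> 0.5
```

A measure of this kind says nothing about whether the omitted couplings matter for training, which is the question the following paragraphs raise.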

Despite  the  rigorous  attempts  to  define  simulator  fidelity, 
one  must  keep  in  mind  that  training  effectiveness  is  the  main 
concern.  If  high  fidelity  does  not  imply  high  transfer  of 
training,  then  fidelity  is  not  a  useful  concept.  As  pointed  out 
by  Rouse  (1982),  the  key  issue  in  the  use  of  simulators  is  the 
level  of  fidelity  necessary  to  assure  transfer  of  training  from 
simulators  to  real  equipment.  The  study  of  simulator  fidelity 
can  help  clarify  the  following  questions. 

1)  What  are  the  variables  affecting  the  feeling  of  realism?

2)  What  is  learned  and  in  what  way?

3)  Can  a  criterion  for  simulator  design  be  found?

4)  What  is  the  relationship  between  each  dimension  of
fidelity  and  transfer  of  training?  Or,  does  any
meaningful  relationship  exist  between  these  two?


An  empirically  sound  definition  of  fidelity  is  necessary  if
any  further  study  of  fidelity  is  anticipated.  It  may  not  be
possible  to  have  a  general  index  of  fidelity  for  design  purposes. 
Nevertheless,  an  explicitly  expressed  and  commonly  accepted 
definition  is  required  for  comparison  of  fidelity  between 
different  simulators. 

RELATIONSHIP  WITH  TRAINING 

A  hypothetical  relationship  among  fidelity,  transfer,  and 
cost  was  proposed  by  R.  B.  Miller  (1954)  (see  Figure  2).  Very
little  empirical  data  have  been  collected  to  explore  this 
relationship.  According  to  Miller,  an  increase  in  the  degree  of 
simulator  fidelity  is  accompanied  by  increases  in  both  transfer 
of  training  and  cost.  The  objective,  both  for  simulator  design
and  for  the  use  of  a  simulator  for  training,  is  to  find  the  optimal
point  of  intersection  between  fidelity,  transfer  and  cost  in  each
case.  One  problem  with  Miller's  formulation  is  that  the  cost  of
a  simulator  could  go  to  infinity  as  its  fidelity
increases  (Orlansky  1984).  Another  problem  is  the  explicit
assumption  that  the  amount  of  transfer  increases  with  increasing
fidelity  of  the  simulator  (Micheli  1972).  Many  researchers  have
found  that  comparable  training  results  may  be  obtained  with  both 
low-  and  high-fidelity  simulators  of  the  same  equipment  (Duncan 
and  Shepherd  1975,  Crawford  and  Crawford  1978,  Johnson  1981).  In 
a  study  by  Martin  and  Waag  (1978),  it  was  shown  that  flight
simulators  with  higher  fidelity  provided  too  much  information  for
novice  trainees  and  actually  detracted  from  simulator
effectiveness.  Prophet  (1966)  reported  a  study  that  compared  a
low  fidelity  simulator  (inexpensive  photographic  mock-up  of  a
cockpit)  with  an  elaborate  trainer.  No  significant
difference  between  groups  was  found.  Despite  these 
counterexamples,  Miller's  approach  is  cited  widely  (Fink  and 
Shriver  1978,  Kinkade  and  Wheaton  1972,  Hays  1981). 
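
The trade-off implied by Miller's formulation can be illustrated numerically. Everything in the sketch below is assumed for the sake of the example: the source gives no functional forms, so a saturating transfer curve and a cost curve that grows without bound as fidelity approaches 1 are chosen arbitrarily, as are the constants `k` and `c`. The rising transfer curve encodes exactly the assumption that the studies cited above call into question.

```python
import math


def transfer(f, k=4.0):
    """Assumed transfer of training at fidelity f in [0, 1): saturating."""
    return 1.0 - math.exp(-k * f)


def cost(f, c=0.1):
    """Assumed cost at fidelity f: unbounded as f approaches 1."""
    return c * f / (1.0 - f)


def best_fidelity(steps=1000):
    """Grid-search the fidelity that maximizes transfer minus cost."""
    grid = [i / steps for i in range(steps)]  # 0.000 .. 0.999
    return max(grid, key=lambda f: transfer(f) - cost(f))


optimum = best_fidelity()
net_value = transfer(optimum) - cost(optimum)
```

Under these assumed curves the optimum lies well below maximum fidelity, because the cost term eventually outpaces the flattening transfer curve. Different curve shapes would move the optimum, which is why the empirical form of the relationship matters.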

A  reformulation  of  Miller's  view  has  been  proposed  by
Orlansky  (1984).  Even  though  Orlansky's  hypothetical
model  is  not  fully  supported  by  empirical  data,  the  known  facts
about  the  cost  of  simulators  and  about  the  relationship  between
transfer  of  training  and  fidelity  have  been  accounted  for  in  the
model.


Kinkade  and  Wheaton  (1972)  have  proposed  a  hypothetical 
relationship  between  the  degree  of  simulator  fidelity,  types  of 
simulator  fidelity  and  the  stages  of  learning  (see  Figure  3). 
Early  in  the  training  program  (procedure  training),  the  trainee
cannot  benefit  from  high  degrees  of  either  physical  or
environmental  fidelity.  However,  as  skill  is  acquired
(familiarization  training),  there  are  requirements  for  increases
in  both  physical  and  environmental  fidelity,  with  the
requirements  for  greater  environmental  fidelity  increasing  at  a
faster  rate.  During  later  stages  of  training  (skill  training),
increases  in  both  types  of  fidelity  are  desirable,  with  a
requirement  for  higher  levels  of  functional  fidelity.

Figure  3:  The  hypothetical  relationship  among  degree  of
simulation  (fidelity)  and  stage  of  learning
(Kinkade  and  Wheaton,  1972)

Johnson  (1981)  was  able  to  show  that  high  fidelity  is  not 
required  for  training  in  procedural  tasks.  Johnson  and  Rouse 
(1982)  reported  similar  results  for  fault  diagnosis  tasks. 
Govindaraj  (1983)  also  cast  doubt  on  the  necessity  of  high 
physical  fidelity  for  problem  solving  training.  Baum  (1981) 
pointed  out  that  empirical  data  to  support  Kinkade  and  Wheaton's 
conjecture  are  lacking  except  those  for  procedure  training. 
Baum,  Riedel  and  Hays  (1982)  conducted  a  study  to  determine  the 
relationship  between  training  device  fidelity  and  transfer  of 
training  for  a  perceptual-motor  maintenance  task.  The  results
indicate  that  physical  similarity  is  a  significantly  more 
important  determinant  of  skill  acquisition  than  functional 
similarity.  These  experiments  provide  some  support  for  Kinkade 
and  Wheaton's  proposal. 

Fink  and  Shriver  (1978)  made  a  point  similar  to  that  made  by 
Kinkade  and  Wheaton.  They  identified  four  training  stages:  (1)
acquisition  of  enabling  skills  and  knowledge,  (2)  acquisition  of
uncoordinated  skills  and  unapplied  knowledge,  (3)  acquisition  of
coordinated  skills  and  ability  to  apply  knowledge,  and  (4)
acquisition  of  job  proficiency.  They  claimed  that  different
stages  require  different  levels  of  fidelity  with  the  first  stage 
requiring  the  lowest  level. 


G.G.  Miller  (1974)  drew  the  following  conclusions  about  the 
relationship  between  fidelity  and  training. 

(1)  High  fidelity  is  never  associated  with  poor 
training. 

(2)  Transfer  of  training  is  more  a  function  of  how  the 
simulator  is  used  rather  than  the  degree  of 
fidelity. 

(3)  Procedural  task  training  does  not  require  high
fidelity. 

Conclusions  two  and  three  are  shared  by  many  other  researchers, 
while  conclusion  one  is  doubtful  as  pointed  out  before. 

No  consensus  has  been  reached  on  the  relationships  between 
fidelity  and  other  factors  such  as  cost,  training,  and  stage  of 
learning.  The  research  in  this  area  is  not  very  conclusive.  The 
difficulty  of  measuring  fidelity  is  part  of  the  reason  for  the 
slow  progress.  The  next  section  discusses  the  problems  and  the 
alternatives  for  the  measurement  of  fidelity. 

MEASUREMENT  OF  FIDELITY 

The  measurement  of  fidelity  is  an  important  step  if  one 
wishes  to  determine  empirically  the  relation  between  level  of 
fidelity  and  training  effectiveness  as  well  as  the  necessary 
fidelity  level  of  a  simulator  for  training  for  a  given  task. 
Specific  transfer  of  training  studies  are  possible  only  after
both  the  simulator  and  the  actual  equipment  have  been  built.
Nevertheless,  there  is  a  need  to  be  able  to  predict  the 
effectiveness  of  the  training  device  prior  to  construction. 
Considering  the  tremendous  cost  and  man-hours  involved  in 
developing  simulators  of  any  fidelity  level,  one  cannot  be 
satisfied  with  a  post  hoc  measure.  A  measure  of  fidelity  that 
correlates  with  the  measure  of  transfer  of  training  is  a  useful 
system  design  guide.  Therefore,  the  purpose  of  measuring 
fidelity  is  the  hope  that  a  predictive  index  can  be  devised  for 
anticipating  the  effectiveness  of  a  training  simulator.  A 
reliable,  predictive  index  of  the  effectiveness  of  a  simulator 
will  be  very  useful  both  for  trainers  and  design  engineers. 
Other  things  being  equal,  such  as  user  acceptance  and  required 
levels  of  funding,  they  can  then  choose  only  those  features  that 
possess  high  transfer  value  and  still  meet  the  training 
objective.  However,  in  practice  this  is  very  hard  to  achieve  due 
to  the  difficulty  of  measuring  simulator  fidelity.  One  of  the 
difficulties  is  the  lack  of  generality  of  such  a  measure. 
Govindaraj  (1983)  pointed  out  that  the  environment  and  the 
purpose  for  which  the  simulator  is  to  be  used  have  a  strong 
influence  on  fidelity.  Also,  fidelity  appears  to  be  very 
context-specific.  Therefore,  it  may  be  difficult  to  derive 
context-free  measures  of  fidelity. 

Wheaton  et  al.  (1976)  assessed  simulator  fidelity  on  two
dimensions:  physical  fidelity  and  functional  fidelity.  They
discussed  the  metrics  of  fidelity  in  the  context  of  constructing
a  model  to  predict  training  device  effectiveness.  In  their 
approach,  a  thorough  task  analysis  of  the  target  system  and  the 
simulator  was  conducted.  Subtasks  of  the  target  system  and  the 
simulator  were  then  clearly  identified.  The  physical  fidelity, 
for  each  subtask,  between  the  real  system  and  the  simulator  was 
evaluated  by  rating  with  a  scale  that  ranged  from  "no
resemblance"  through  "dissimilar"  and  "similar"  to  "identical".  The
functional  fidelity  was  evaluated  by  recording  the  operator's 
behavior  in  terms  of  the  information  flow  from  each  display  to 
the  operator,  and  from  the  operator  to  each  control.  For  each 
subtask,  the  type,  amount,  and  direction  of  information  were
assessed  using  information-theoretic  methods.  Then  a  four-point 
scale  was  applied  by  comparing  the  information  metrics  between 
the  real  system  and  the  simulator  on  each  subtask. 
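
As a rough illustration of what an information-theoretic comparison might look like, the sketch below estimates the Shannon entropy of the readings one display presents during a subtask, once for the real system and once for the simulator. This is an assumption made for illustration, not Wheaton et al.'s actual metric: the source does not give their formulas, and the sample readings and the function name `entropy_bits` are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(observations):
    """Shannon entropy, in bits, of the observed display states."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical readings of one gauge during the same subtask.
real_display = ["low", "normal", "high", "normal"]
sim_display = ["normal", "high", "normal", "high"]

h_real = entropy_bits(real_display)  # 1.5 bits
h_sim = entropy_bits(sim_display)    # 1.0 bit
```

In this made-up case the simulator's gauge conveys less information than the real one because it never shows the "low" state. A rater could then place such a difference on the four-point scale described above.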

The  underlying  assumption  was  that  the  higher  the  rating  on 
the  assessment  factors,  the  higher  the  transfer  that  would  take 
place  and  the  more  effective  the  simulator.  However,  as  pointed
out  by  Adams  (1979),  rating  is  very  subjective  and  its
reliability  is  questionable.  Further  refinement  of  this
assessment  process  was  reported  by  Narva  (1977),  in  which  the
physical  fidelity  and  the  functional  fidelity  were  measured  by
rating  with  emphasis  on  behavioral  categories  instead  of  the
original  subtasks.  Some  of  the  behavioral  categories  used
include  rule  learning  and  use,  detection,  symbol  identification,
decision  making,  etc.


Caro  (1970)  advocated  a  procedure  called  Equipment  Device 
Task  Commonality  Analysis  in  which  the  measurement  of  fidelity 
was  conducted  by  assessing  the  similarity  of  S-R  relationships  in
the  real  system  and  the  simulator.  Positive  transfer  was  assumed 
to  occur  when  both  stimuli  and  responses  were  similar.  Negative 
transfer  was  predicted  when  the  stimuli  were  similar  but  the 
responses  were  different.  This  is  similar  to  what  Osgood  (1949) 
proposed.  The  assessment  of  the  similarity  was  also  accomplished 
through  rating.  This  procedure  applies  only  to  simulators  where 
the  stimuli  and  the  responses  can  be  clearly  identified.  In  a 
complex  system,  it  may  be  impossible  to  specify  the  stimuli 
clearly. 
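
The prediction rule stated above can be written as a small function. The function name and the boolean inputs are hypothetical; in Caro's procedure the similarity judgements come from ratings, and the source does not say what is predicted when the stimuli are dissimilar, so that case is returned as "unspecified" rather than guessed.

```python
def predicted_transfer(stimuli_similar, responses_similar):
    """Predict the direction of transfer from S-R similarity.

    Covers only the two cases stated in the text; any other
    combination is reported as "unspecified".
    """
    if stimuli_similar and responses_similar:
        return "positive"
    if stimuli_similar and not responses_similar:
        return "negative"
    return "unspecified"


# Example: same display cue, but a different control action is required.
example_prediction = predicted_transfer(True, False)
```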

COMPONENTS  OF  FIDELITY 


As  pointed  out  in  the  previous  discussion,  "fidelity"  is  a
multi-dimensional  concept.  An  operational,  comprehensive
definition  may  be  difficult  to  obtain.  However,  the  building
blocks  of  fidelity  have  been  widely  noted  and  studied  for  a  long
time.  These  are  the  design  features  of  a  simulator.  Some  of
them  are  discussed  below.  This  list  is  definitely  not
exhaustive.


Stress

There  is  no  doubt  that  stress  is  experienced  by  most
operators  of  any  real  system.  However,  as  Duncan  and  Shepherd 
(1975)  pointed  out,  it  is  not  clear  how  or  if  stress  can  be 
simulated  during  training.  There  are  at  least  three  types  of 
stress.  First,  there  is  the  feeling  of  danger.  Creating  this 
type  of  stress  on  a  simulator  during  training  is  very  difficult. 
Second,  there  is  the  threat  of  hazard  or  sanction.  This  form  of 
stress  can  only  be  simulated  by  manipulating  reward  as  a 
consequence  of  performance.  Third,  there  is  time  stress.  This 
can  easily  be  introduced  into  the  training  task,  but  may  alter 
the  trainee's  perception  of  the  task.  Not  much  is  known  about 
how  to  incorporate  stress  into  simulator  training  or  if  its 
presence  contributes  to  adequate  training  (apart  from  user 
acceptance  or  irrelevant  opinion).

Environment 

Noise  is  distracting,  especially  in  complex  tasks  that
require  close  attention  and  concentration  (Finkelman  1975).
Improper  lighting  (Tinker  1943),  temperature  (Pepler  1972),  etc.
degrade  human  performance.  However,  how  much  these  affect  the
fidelity  level  or  how  much  they  contribute  to  the  training
effects  is  difficult  to  estimate.  While  noise,
inappropriate  lighting,  and  temperature  may  degrade  general
performance,  systematic  noise  or  unusual  heat  or  temperature  are
repeatedly  reported  to  be  of  great  help  for  failure  detection
and  diagnosis.  Many  trainees,  designers  and  experienced
operators  admit  the  possibility  of  using  unusual  environmental
changes  as  a  clue  to  detect  or  diagnose  the  failure.  Vibration
has  been  given  the  same  appraisal  (Longman,  Phelan  and  Hansford
1981,  McCallum  and  Rawson,  Jaspers  and  Hanley  1980,  Semple  et  al.
1981,  Martin  and  Waag  1978).

Layout 

Panel  layout,  display  size  and  even  the  coloring  of 
instruments  are  considered  to  be  important  factors  that  affect 
the  feeling  of  realism.  More  important  is  the  relative  distance 
and  the  relative  position  between  gauges,  annunciators  and  status 
indicators  (Fowler  et  al.  1968).  Duncan  and  Shepherd  (1975)
argued  that  the  trainees  may  develop  strategies  that  heavily 
depend  on  patterns  of  the  presented  stimuli.  The  size  of  the 
display  may  influence  the  amount  of  information  the  trainee  can 
process  at  any  one  time.  The  relative  distance  between  gauges 
and  the  relative  position  of  stimuli  may  affect  the  pattern 
recognition  process.  However,  Duncan  and  Shepherd  pointed  out 
that  the  influence  of  such  factors  is  unknown.

Wholeness

A  full-scale  simulator  provides  all  aspects  of  system 
training,  while  a  part-task  simulator  presents  only  selected 
parts  of  the  full  system  to  the  trainees.  The  benefit  of  a
part-task  trainer  is  that  a  particularly  important  subsystem
such  as  the  turbine  or  the  boiler  may  be  represented  with  greater
physical  fidelity  and  provided  for  training  before  coping  with 
the  entire  system.  However,  the  functional  fidelity  may  be 
affected  due  to  the  isolation  of  a  particular  subsystem.  Curry 
(1981)  observed  that  detection,  diagnosis  and  remedial  action  are 
generally  assumed  to  be  three  separate  tasks.  Therefore, 
training  on  each  one  can  be  accomplished  independently  without 
too  much  trouble.  Rouse  (1981)  found  that  a  particular  logical 
judgement  process  is  especially  important  for  effective  fault 
diagnosis.  Abstracting  this  logical  process,  he  developed  a 
context-free  task,  TASK,  which  is  in  some  sense  a  decomposition 
of  the  fault  diagnosis  behavior.  He  demonstrated  positive 
transfer  of  training  from  TASK  to  a  real  system.  Rasmussen 
(1980)  proposed  a  criterion  for  the  decomposition  of  a  complex 
function.  He  observed  that:

"...break-down  of  complex  functions  is  only  acceptable
if  the  performance  is  paced  by  the  system,  i.e.,  cues
from  the  system  serve  to  initiate  elementary,  skilled
sub-routines  individually  and  to  control  their
sequence.  This  is  the  case  in  many  manual  tasks,  e.g.,
mechanical  assembly,  but  can  probably  also  be  arranged
in  more  complex  mental  tasks  by  properly  designed
interface  systems."  (p.  92)


The  influence  of  the  part-task  trainer  on  complex  mental  tasks, 
such  as  fault  diagnosis  and  problem  solving,  is  not  yet  clearly 
understood.  However,  the  unverified  conjecture  is  that  wholeness 
is  not  a  crucial  fidelity  factor. 

Dynamics 

Most  real  systems  are  dynamic,  as  are  most  simulators. 
However,  static  simulators  have  been  used  increasingly  in  the 
past  few  years  (Duncan  and  Shepherd  1975,  Shepherd  et  al.  1977,
Hunt  and  Rouse  1981,  Johnson  and  Rouse  1982).  Static  simulators 
only  allow  the  operators  to  check  the  system  status,  while 
dynamic  simulators  accept  control  commands  and  execute  them. 
There  is  no  doubt  that  dynamic  simulators  describe  the  object 
task  better  than  static  simulators  do,  but  how  much  better  is  an
unanswered  question.  Forbus  and  Stevens  (1981)  indicated  that
there  is  a  growing  amount  of  evidence  that  human  understanding  of 
physical  systems  is  based  on  qualitative  models  of  those  systems. 
This  evidence  comes  from  psychological  studies  (Larkin  et  al. 
1980)  and  is  supported  by  success  in  artificial  intelligence  in 
actually  constructing  systems  that  reason  about  physical 
situations  using  qualitative  models  (deKleer  1979,  Forbus  1980). 
Govindaraj  (1983)  proposed  a  qualitative  approach  to  modeling  a
complex  dynamic  system.  This  approach  may  provide  a  way  to
associate  the  level  of  dynamic  fidelity  with  an  explicit  training
effect.  However,  there  are  no  empirical  data  to  support  the
transfer  effect  of  the  qualitative  dynamic  simulator.

Abstraction 

A  physical  system  can  be  represented  mentally  in  different 
forms  (Rasmussen  1979).  Simulators  may  be  constructed  to
represent  the  physical  system  at  different  levels  of  abstraction.
At  the  bottom  of  the  hierarchy  is  the  realization  of  the  physical
components  in  detail,  analogous  to  a  system  mock-up.  The  higher
the  model  stands  in  the  hierarchy  by  aggregating  elements  into
larger  units  or  by  abstracting  through  functional  properties,  the
lower  the  physical  fidelity  becomes.  A  system  block  diagram  is  an
example  of  a  more  abstract  simulator.  Each  level  of  abstraction 
possesses  its  own  set  of  symbols  and  syntactic  rules.  Abstract 
simulators  may  be  more  effective  in  training  for  fault  diagnosis 
due  to  the  absence  of  irrelevant  cues.  Rasmussen  argued  that
shifting  between  levels  of  abstraction  to  find  a  suitable  strategy  may
be  helpful  for  problem  solving.  This  implies  that  training  at
lower  physical  fidelity  and  a  higher  abstraction  level  may  transfer
well  to  situations  of  higher  physical  fidelity  and  lower  abstraction.
The  fact  that  diagnosis  can  be  viewed  as  a  top-down  process  may 
explain  why  lower  physical  fidelity  and  higher  abstraction  level 
simulators  could  perform  better  in  this  type  of  training. 
Therefore,  functionally  speaking,  it  is  hard  to  decide  which  one
has  higher  fidelity. 

The  value  of  simulators  of  different  abstraction  levels  may 
be  different  for  different  levels  of  trainees.  Kriessman  (1981) 
speculated  that  simulators  of  different  fidelity  levels  may
achieve  the  training  effect  differently.  A  high  fidelity 
simulator  is  good  for  more  experienced  trainees,  while  a  low 
fidelity  simulator  is  better  for  less  experienced  ones.  However, 
it  is  still  an  open  question  as  to  whether  the  use  of  simulators 
of  different  abstraction  levels  may  provide  the  operator  with 
different  skills  or  the  same  type  of  skills  but  in  a  degraded 
mode.

State  Variables

Most  of  the  state  variables  in  a  real  system  are  presented 
in  a  continuous  manner  via  gauges  and  meters,  while  for 
simplification,  some  simulators  may  represent  the  state  variables 
in  discrete  language  such  as  high/medium/low  or  on/off. 
Internally,  the  human  processes  information  in  a  discrete  manner, 
especially  when  logical  reasoning  is  involved.  He  may  classify 
information  into  several  finite  sets.  Presenting  information  in 
a  discrete  manner  may  not  result  in  a  loss  of  information  as  long 
as  the  classifying  scheme  matches  the  human's  internal  model. 
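
As  a  minimal  sketch  of  the  discrete  presentation  described  above,  a  continuous  gauge  reading  can  be  mapped  onto  a  small  set  of  labels.  The  cut  points  (30  and  70)  and  the  function  name  are  hypothetical;  in  practice  the  classifying  scheme  would  have  to  be  chosen  to  match  the  operator's  internal  model,  which  this  sketch  does  not  attempt  to  determine.

```python
from bisect import bisect_right


def discretize(reading, cut_points=(30.0, 70.0), labels=("low", "medium", "high")):
    """Map a continuous reading to a discrete label.

    cut_points must be sorted in ascending order; a reading equal to a
    cut point falls into the higher class. The default values are
    illustrative assumptions, not values from the report.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, reading)]


readings = [12.5, 30.0, 85.0]
classes = [discretize(r) for r in readings]
```

An  on/off  display  is  the  same  scheme  with  a  single  cut  point  and  two  labels.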

The  increasing  use  of  CRTs  for  display  in  simulators 
introduces  difficulty  in  presentation  of  state  variables  because 
of  size  constraints.  The  most  common  strategy  is  to  use  serial
presentation  instead  of  parallel  presentation  which  is  the  usual 
way  information  is  transferred  to  the  operator  in  a  real  system. 
However,  considering  the  human  as  a  limited  information 
processor,  this  restriction  may  not  be  as  serious  a  fidelity 
problem  as  it  first  appears.  The  attention  span  for  human  beings 
is  well  known  to  be  narrow  and  varying  in  time.  The  state 
variables  in  a  real  system,  though  presented  simultaneously,  are
possibly  processed  in  a  serial  manner,  perhaps  chunk  by  chunk.
However,  how  serial  presentation  of  state  variables  affects
fidelity  may  depend  on  the  type  of  simulator  used.

SUMMARY 

Several  attempts  have  been  made  to  define  "fidelity".  Hays 
(1981)  proposed  a  functional  fidelity  vs.  physical  fidelity 
approach.  Rouse  (1982)  suggested  a  similar  idea.  Govindaraj 
(1983),  oriented  toward  an  operational  definition,  decomposed
functional  fidelity  into  structural  fidelity  and  dynamic 
fidelity.  Lack  of  empirical  studies  of  fidelity  issues  makes  it 
difficult  to  develop  a  useful  definition  of  fidelity.  A 
generally  accepted  assertion  is  that  higher  fidelity  does  not 
guarantee  better  transfer.  Kinkade  and  Wheaton  (1971) 
conjectured  that  the  fidelity  requirement  varies  with  the  stages 
of  learning.  Generally,  it  is  proposed  that  procedural  tasks  do 
not  require  as  high  a  fidelity  as  visual-motor  skills  do. 


Wheaton  (1976)  and  Narva  (1977)  developed  a  predictive  index 
for  transfer  effect  based  on  the  measurement  of  fidelity.  Task 
analysis  of  both  the  real  system  and  the  simulator  is  the 
foundation  for  measurement.  Rating,  so  far,  is  employed  in 
almost  every  fidelity  metric.  A  more  objective  metric  based  on 
system  characteristics,  and  perhaps  learner  state  only,  is  an 
important  future  research  topic. 

Several  factors  that  affect  fidelity  were  also  discussed.  A 
brief  summary  is  reproduced  below. 

(1)  It  is  very  difficult  to  include  stress  in  the  simulator. 

(2)  Environmental  factors  such  as  noise,  lighting,  temperature, 
motion  and  vibration  are  annoying  but  may  be  treated  as 
diagnostic  aids.  Inclusion  of  these  variables  does  increase 
fidelity,  but  the  cost-effectiveness  of  including  them  in  a 
simulator  has  long  been  challenged. 

(3)  Layout  may  affect  the  strategy  used  by  trainees. 

(4)  The  important  issue  in  the  use  of  part  task  simulators  is  the 
decomposability  of  the  tasks.

(5)  Dynamic  features  may  not  be  crucial  in  training  for  fault 
diagnosis.  Several  studies  indicated  that  the  human  reasons 
in  a  qualitative  rather  than  quantitative  way.  This  suggests 
an  important  research  topic. 

(6)  It  may  be  beneficial  to  vary  the  level  of  abstraction  of  the 
simulator  depending  upon  the  level  of  skill  of  the  trainee. 


(7)  In  a  real  system,  the  state  variables  are  presented
simultaneously,  although  humans  may  not  be  able  to  process
all  of  this  information  at  once.  As  a  limited  information 
processor,  human  operators  may  do  well  with  serial 
presentation  of  the  state  variables. 

Research  on  simulator  fidelity  is  geared  toward  better 
understanding  of  the  learning  process  and  the  construction  of  a 
predictive  index  of  transfer  effect.  These  as  well  as  other 
promising  research  topics  are  discussed  in  the  following  section. 


E.  FUTURE  RESEARCH 


A  majority  of  the  research  on  simulator  training  has 
concentrated  on  normal  operation.  However,  the  increasing  use  of 
automation  in  large  complex  systems  has  made  the  human  operator 
more  of  a  monitor  or  a  supervisor  who  only  interacts  with  the 
system  when  failures  occur.  This  tendency  results  in  the 
increasing  emphasis  on  fault  diagnosis  training.  Future  research 
on  simulator  training  is  largely  influenced  by  this  trend. 
Another  area  worth  noting  is  that  new  technologies  like 
videodiscs  and  computer  graphics  are  gradually  changing  the 
characteristics  of  simulator  training.  Klein  et  al.  (1978), 
Swezey  (1981),  and  Levin  and  Fletcher  (1981)  were  concerned  with 
these.  An  extension  of  what  they  presented  is  discussed  below. 
Studies  related  to  each  topic  are  supplied  when  available.  This 
is  not  intended  to  be  exhaustive  because  the  research  on 
simulator  training  is  multi-directional. 

NEW  TECHNOLOGY 

Advances  in  microprocessors,  videodiscs  and  computer 
graphics  have  led  to  drastic  changes  in  the  design  of  real 
systems  and  simulators.  Berman  (1981)  reported  that  General 
Electric's  Nuclenet  1000  control  system  uses  10  CRTs  to  replace
as  many  as  75  percent  of  the  components  previously  used  on 
vertical  control  boards.  Kaplan  (1983)  depicted  a  venture  in 
which  a  nuclear-power-plant  malfunction  analyzer  was  built  by
using  advanced  graphics  technology.  Levin  and  Fletcher  (1981)
advocated  the  use  of  videodiscs  for  training.  The  benefits  of
using  videodiscs  in  training  equipment,  they  claimed,  were  low
cost  and  flexibility.  Videodiscs  combine  the  advantages
of  text,  slides,  film,  audio  and  computers.  Bunderson  and
Campbell  (1980)  discussed  some  of  the  problems  in  adopting 
videodiscs  as  training  equipment.  They  claimed  that  videodiscs 
are  not  well  suited  as  training  devices  at  the  current  stage  of 
development,  but  promise  to  be  useful  in  five  years. 


VALIDATION  OF  MODELS 

Most  of  the  models  and  guidelines  used  in  the  evaluation  of 
transfer  effects  are  theoretical  constructs.  Validation  and 
experimentation  are  required.  For  example,  Wheaton  et  al.
(1976)  proposed  a  predictive  index  of  transfer  effects  based  on
the  training  program  and  simulator  fidelity.  Although  the  index
was  later  modified  by  Narva  (1977),  no  empirical  data  have  been
reported  since  then.

ACQUISITION  AND  DECAY  OF  TRAINING 

There  is  very  little  applicable,  quantitative  information 
available  on  learning  curves  and  learning  decay  (retention  of 
training)  for  different  types  of  task  and  training  method.  The 
impact  of  time  and  intensity  of  training  on  the  acquisition  of 
learning  is  a  critical  question  with  implications  for  cost  and 
cost-effectiveness.  Using  a  Thomas  table-top  collator,  model 
T-8,  Weitz  and  Adler  (1973)  showed  that  male  trainees  should  not 
be  trained  beyond  the  point  at  which  they  have  reached  some
minimal  criterion  of  performance.  Overtrained  male  trainees
tended  to  develop  simulator-specific  habits  which  interfered  with,  or
became  dominant  factors  in,  real-world  performance.  Aspects  of
the  basic  learning  process  like  these  may  be  incorporated  into 
training  device  design  in  the  hope  that  the  transfer  effect  can 
be  increased  as  much  as  possible.  However,  very  little  is  known 
about  these  issues. 


INDIVIDUAL  DIFFERENCES 

There  is  an  increasing  awareness  that  training  devices  are 
most  successful  when  tailored  to  the  particular  "cognitive  style" 
and  "capabilities"  of  the  trainee.  The  ACTS  (Adaptive  Computer 
Training  Systems)  reported  by  Freedy  and  Lucaccini  (1981)  is  an 
attempt  in  this  direction.  A  utility  decision  model  is  employed 
to  estimate  the  "capabilities"  of  the  trainee.  Individualized 
instruction  is  then  given  to  the  trainee  based  on  the  result  of 
the  estimation. 


"Cognitive  style",  specifically  impulsivity-reflectivity,
was  reported  to  be  a  reasonable  predictor  of  errors  on  fault
diagnosis  tasks  (Henneman  and  Rouse  1984).  It  is  therefore
reasonable  to  speculate  that  training  for  fault  diagnosis  tasks 
should  fit  one's  cognitive  style.  However,  very  little  research 
has  been  conducted  in  this  direction. 


SKILL  LEVEL  VARIANCES 

As  skill  levels  vary  across  trainees,  so  perhaps  should  the 
type  of  device  used.  Kriessman  (1981)  speculated  that  a  high 
fidelity  simulator  is  good  for  more  experienced  trainees,  while  a 
low  fidelity  simulator  is  good  for  less  experienced  trainees. 
This  is  the  assumption  that  underlies  the  proposal  of  a 
mixed-fidelity  approach  to  simulator  training  by  Rouse  (1982)  and 
Johnson  and  Fath  (1983).  The  effectiveness  of  training  of  this
kind  remains  to  be  fully  verified. 


PERFORMANCE  MEASUREMENT 

Adequate  measures  for  human  problem  solving  performance  are 
the  basis  for  transfer  effect  experiments,  especially  those  on 
fault  diagnosis  training.  Henneman  and  Rouse  (1984)  have 
conducted  extensive  research  on  this  topic.  They  indicated  that 
there  are  only  three  unique  dimensions  of  performance:  errors, 
inefficiency  and  time.  In  addition,  cognitive  style  appears  to 
be  a  reasonable  predictor  of  performance.  How  well  these  metrics 
can  be  applied  to  types  of  training  other  than  fault  diagnosis  is 
not  yet  determined.  Also,  whether  these  variables  affect  the 
design  of  a  simulator  is  not  clear. 


DECISION  AIDS 

Decision  aids  in  a  training  simulator  help  trainees  learn 
efficiently.  However,  they  may  not  reside  in  the  real  system.
The  decision  aids  may  help  the  trainees  substantially  but  leave
them  helpless  when  transferred  to  the  real  system
because  of  the  unavailability  of  these  aids.  Aids  of  this  sort
should  "fade  out"  (Goodstein  1981)  before  the  trainee  is  transferred
to  the  real  system.  How  and  when  to  fade  out  aids  is  a  question
worth  pursuing.
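
One  possible  form  of  fading,  given  purely  as  an  illustration,  is  a  linear  schedule  in  which  full  aiding  is  available  up  to  some  trial  and  is  then  withdrawn  gradually.  The  function  name,  the  linear  shape  and  the  trial  numbers  are  all  assumptions;  the  source  leaves  the  question  of  how  and  when  to  fade  aids  open,  and  this  sketch  does  not  answer  it.

```python
def aid_level(trial, fade_start, fade_end):
    """Fraction of aiding offered on a given trial (1.0 = full, 0.0 = none).

    Hypothetical linear schedule: full aiding through fade_start, then a
    straight-line decrease that reaches zero at fade_end.
    """
    if fade_end <= fade_start:
        raise ValueError("fade_end must be greater than fade_start")
    if trial <= fade_start:
        return 1.0
    if trial >= fade_end:
        return 0.0
    return (fade_end - trial) / (fade_end - fade_start)


# Aiding offered on selected trials when fading runs from trial 10 to trial 20.
schedule = [aid_level(t, fade_start=10, fade_end=20) for t in (5, 15, 25)]
```

Other  schedules,  for  example  ones  tied  to  trainee  performance  rather  than  to  trial  number,  would  fit  the  same  interface.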

MENTAL  MODELS 

Mental  models  are  internal  representations  of  the  external 
environment.  They  can  assist  human  reasoning  by  producing 
explanation  or  justification  of  complex  system  behavior.  They 
are  powerful  analogical  devices  humans  use  in  learning  (Montague 
1981).  Landeweerd  (1979)  indicated  that  mental  models  probably 
played  an  important  role  in  fault  correction  and  in  the 
verification  process  in  diagnosing  faults.  Prather  (1973)  showed 
that  mental  practice  of  landing  the  T-37  aircraft  could  improve 
the  actual  performance.  However,  it  is  not  known  how  mental 
models  might  be  used  in  designing  training  equipment,  or  how  a 
mental  model  might  affect  the  learning  of  a  skill. 


F.  CONCLUSION 


Simulators  have  long  been  used  as  training  devices  due  to 
belief  in  their  cost  effectiveness  and  flexibility.  New 
technology  may  have  changed  the  characteristics  of  the  physical 
configurations  of  the  simulators.  However,  the  basic  problems  of 
using  simulators  for  training  still  remain.  Transfer  effect, 
fidelity  level  and  their  relation  to  cost  are  three  of  them. 

The  main  techniques  used  to  measure  the  transfer  effect  are 
transfer  of  training  and  ratings.  Transfer  of  training  research 
uses  a  fixed  paradigm  in  which  the  experimental  group  goes 
through  both  simulator  training  and  real  system  training  while 
the  control  group  experiences  only  the  real  system  training. 
Several  performance  measures  have  been  devised  to  assess  the 
effect  of  simulator  training  on  performance  on  the  real  system. 
They  can  be  classified  into  either  time  savings  measures  or
first-shot  performance  measures.  Savings  measures  determine  the 
savings  of  training  efforts  on  real  systems.  The  first-shot 
measure  evaluates  the  performance  of  the  trainees  on  the  first 
trial  after  transferring  to  a  real  system. 
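
The  two  classes  of  measure  can  be  sketched  as  follows.  The  percentage  form  of  the  savings  measure  is  one  commonly  used  formulation  and  is  an  assumption  here,  since  this  section  does  not  give  a  formula;  the  function  names  and  all  numbers  are  hypothetical.

```python
from statistics import mean


def percent_savings(control_effort, experimental_effort):
    """Savings measure: percent reduction in real-system training effort.

    control_effort: effort (e.g. hours or trials) the control group needs
        on the real system to reach criterion.
    experimental_effort: effort the simulator-trained group needs on the
        real system to reach the same criterion.
    """
    if control_effort <= 0:
        raise ValueError("control_effort must be positive")
    return 100.0 * (control_effort - experimental_effort) / control_effort


def first_shot_difference(control_scores, experimental_scores):
    """First-shot measure: difference in mean first-trial score on the real system."""
    return mean(experimental_scores) - mean(control_scores)


# Hypothetical data: 20 hours to criterion without simulator training, 14 with.
savings = percent_savings(20.0, 14.0)
# Hypothetical first-trial scores on the real system for each group.
first_shot = first_shot_difference([60, 70], [75, 85])
```

A  negative  value  from  either  function  would  indicate  negative  transfer,  that  is,  simulator  training  that  hindered  performance  on  the  real  system.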

The  main  difficulty  in  using  transfer  of  training  is  that  it 
becomes  useless  if  no  control  group  can  be  formed.  This 
situation  occurs  frequently  in  training  for  normal  operation. 
Alternatives  such  as  in-simulator  transfer  of  training  have  been 
proposed  and  used,  but  not  widely.  Another  problem  with  transfer
of  training  research  is  the  unavailability  of  the  target  task  in 
the  real  system.  In  fault  diagnosis  tasks,  a  formal  transfer  of 
training  study  cannot  be  conducted  simply  because  of  the 
infeasibility  of  producing  the  fault  situations  in  the  real 
system.  A  modification  of  this  is  to  treat  fault  diagnostic 
behavior  itself  as  the  target  task  instead  of  specific  failures. 

Ratings  are  inexpensive  and  convenient  to  perform.  However, 
they  are  subjective  and  have  limited  reliability  and  validity. 
Nevertheless,  ratings,  when  used  with  task  analysis,  are  the  basis
of  a  predictive  model  for  simulator  training  effects.  Despite 
severe  theoretical  drawbacks,  ratings  are  still  adopted  widely  in 
practice. 

"Simulator  fidelity"  has  been  used  to  describe  how  the 
simulator  resembles  the  real  system.  It  is  generally  accepted 
that  both  physical  fidelity  and  non-physical  fidelity  are  factors 
which  influence  the  transfer  effect.  However,  there  is  no 
consensus  on  what  non-physical  fidelity  is.  To  reach  a  possible 
consensus  on  the  definition  of  simulator  fidelity,  a  thorough 
investigation  of  its  relationship  with  training  and  its 
components  is  required.  Most  of  the  frequently  described 
relationships  among  fidelity,  transfer,  and  cost  are 
hypothetical.  Very  few  empirical  data  have  been  collected  to
support  these  supposed  relationships.  However,  there  are  two
assertions  about  the  relationship  between  fidelity  and  types  of
task  that  are  supported  by  empirical  research.  They  are:  (1)
perceptual-motor  coordination  tasks  require  higher  fidelity,  and
(2)  procedural  task  training  does  not  require  high  physical
fidelity.

To  study  systematically  the  relationships  between  fidelity 
and  other  factors,  a  reliable  measure  of  fidelity  is  necessary. 
Most  of  the  measures  of  fidelity  are  based  on  task  analysis  and 
ratings.  These  measures  emphasize  the  human's  reaction  to  the 
system  instead  of  the  system  characteristics.  A  fidelity  measure 
based  on  the  system  characteristics  such  as  the  structure,  the 
dynamics,  etc.,  will  be  more  fruitful. 

The  components  of  fidelity  presented  are  those  that  affect 
training.  Human  factors  and  cognitive  psychology  points  of  view 
have  been  used  to  study  stress,  environment,  layout  and 
wholeness.  Their  influences  on  fidelity  are  relatively  obvious. 
How  level  of  abstraction,  dynamics  and  state  variables  affect 
simulator  fidelity  is  still  under  investigation.  In  the  context 
of  simulator  training  for  fault  diagnosis  tasks,  the  latter  three 
factors  are  receiving  more  attention. 

The  distinction  between  training  for  normal  operation  and 
training  for  fault  diagnosis  is  very  important.  Training  for 
normal  operation  emphasizes  visual-motor  coordination  tasks  while
training  for  fault  diagnosis  is  cognitive-task  oriented.
Increasing  use  of  automation  in  complex  systems  has  made  the 
latter  more  important.  This  trend  coupled  with  the  impact  of  new 
technologies  has  attracted  considerable  research.  Several  future 
research  topics  were  outlined,  including  validation  of  models, 
acquisition  and  decay  of  training,  individual  differences,  skill 
level  variance,  performance  measures,  decision  aids,  mental 
models  and  fidelity  effect. 

Simulator  training  is  becoming  more  and  more  important,  due 
to  the  increasing  trend  toward  large  complex  systems.  However, 
very  little  has  been  done  to  enhance  our  understanding  of  the 
factors  affecting  its  effectiveness.  This  report  has  tried  to 
piece  together  the  research  that  has  been  accomplished  so  far 
into  a  systematic  framework.  It  is  only  the  beginning  of  further
research.